In the past few years, 5% of Cuba’s population of 11 million people has been encountered at the U.S. border. Once processed, these migrants begin a process toward becoming permanent residents within a year and settle in different parts of the United States. Many of them maintain connections to family and friends in Cuba, and the remittances they eventually send back are a significant part of the Cuban economy. The purpose of this report is to explore the distribution of the foreign-born Cuban population in the United States, in order to provide context and support for the study of Cuban migration and remittance landscapes.
The literature review that I performed ahead of this exploratory analysis highlights the importance of remittances to Cuban families. The results of investment in the built environment are as visible as the disinvestment. From personal experience, many Cubans work to send money home in small ways, like purchasing minutes and data for a phone line, and in big ways, like fronting money for renovation materials, installing a new water tank system, or purchasing a new home altogether. These processes shape an urbanism unique to Cuba, which is the subject of a larger study that could stem from this project.
I believe that taking stock at the most granular level available in Census data will give interested parties insight into the remittance landscapes of Cuba and migratory urbanism. From a remittance economics perspective, foreign-born Cubans moving to the United States are the primary agents of remittance flows: they generate the capital with their labor. Usually, when an immigrant has more money in their pocket, there is more available to remit.
If we accept that places can function as sorting mechanisms for the settlement of peoples, then we can assume that migrants will settle where their networks lead them. For selected geographic areas, I look at where Cubans settle and compare metrics such as the total count of housing units and the gross rent to income ratio. Based on my experience living at the margins in America, the cost of one’s rent relative to one’s income determines the standard of living that can be had. While doing this research, I put myself in my mother’s shoes and wondered: If I had come to the United States, where could I live and keep more of my income? Where could I access my community and have more money to remit?
Moreover, from a business perspective, the resulting tables and plots offer first steps towards creating geomarketing campaigns intended to connect to the U.S.-based Cuban community. If someone has a business that, for instance, provides remittance services, it can get its message to the right places more efficiently.
Lastly, knowing where the population of foreign-born Cubans in the U.S. has changed is useful information for organizations that advocate for the Cuban-American vote and other policy initiatives.
With all these uses in mind, I sought to gain some insight into the changing Cuban-American landscape.
To perform the exploratory analysis, I use tidycensus to retrieve data and their associated geometries. I operate at three scales: core-based statistical areas, counties, and census tracts. Note that the Census defines ‘foreign-born’ populations as ‘anyone who is not a U.S. citizen at birth, including those who become U.S. citizens through naturalization.’ Therefore, I do not study the broader Cuban-American population (people born in the U.S. who identify as having Cuban heritage).
I compiled lists of variables for the 2009 and 2022 ACS 5-Year Estimates. The variable codes differ between years, so I built each list by carefully reading the output of tidycensus's load_variables() and filtering it in RStudio's interactive View(). All of the metropolitan and micropolitan data use the 2022 estimates. The county and census tract data draw on both 2009 and 2022 in order to calculate the change over time in foreign-born Cubans.
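One way to keep the year-specific codes aligned is a small crosswalk. The sketch below is hypothetical: it covers only some of the codes that appear in the rename() calls later in this report, and the concept names are illustrative. Because tidycensus uses the names of a named `variables` vector to label its output, both years would come back with matching column names.

```r
# Hypothetical crosswalk built from codes used in this report's rename() calls.
# Each row pairs the 2009 and 2022 ACS variable IDs for one concept.
acs_crosswalk <- data.frame(
  concept = c("fb_cuba", "vacant_units", "rent_income_pct",
              "med_year_built", "med_home_value"),
  var09   = c("B05006_127", "B25002_003", "B25071_001",
              "B25035_001", "B25077_001"),
  var22   = c("B05006_143", "B25002_003", "B25071_001",
              "B25035_001", "B25077_001"),
  stringsAsFactors = FALSE
)

# Named vectors: the names become the output column labels in get_acs()
acs09_named <- setNames(acs_crosswalk$var09, acs_crosswalk$concept)
acs22_named <- setNames(acs_crosswalk$var22, acs_crosswalk$concept)

acs22_named
```

Passed as `variables = acs22_named` to `get_acs(..., output = "wide")`, the estimate columns would be named `fb_cubaE`, `vacant_unitsE`, and so on, which would make most of the rename() steps unnecessary.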
A metropolitan or micropolitan statistical area contains a core area with a substantial population nucleus, as well as adjacent communities having a high degree of economic and social integration with that core. The term Core Based Statistical Area (CBSA) became effective in 2000 and refers collectively to metropolitan and micropolitan statistical areas. Each metropolitan statistical area must have at least one urbanized area of 50,000 or more inhabitants. Each micropolitan statistical area must have at least one urban cluster of 10,000 to 49,999 inhabitants.
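A toy sketch of the size rule above. `cbsa_type()` is a hypothetical helper and `core_pop` stands for the population of the largest urban core; actual CBSA status comes from the OMB delineations, not from a calculation like this.

```r
# Classify a core population using the thresholds described above
cbsa_type <- function(core_pop) {
  ifelse(core_pop >= 50000, "Metropolitan",
         ifelse(core_pop >= 10000, "Micropolitan", NA_character_))
}

cbsa_type(c(2500000, 49999, 10000, 8000))
# returns "Metropolitan" "Micropolitan" "Micropolitan" NA
```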
The first Census API call retrieves all core-based statistical areas across the United States from the 2022 ACS 5-Year Estimates. I separate the data into metropolitan and micropolitan areas and then filter each, based on the count of foreign-born Cubans, into subsets that provide context on the existing presence of Cuban migrants.
Because the same call returns both metropolitan and micropolitan areas, I isolate them by name. I then use filter() to keep metro areas with 1,000 or more foreign-born Cubans and micro areas with 100 or more. These thresholds were chosen arbitrarily. Lastly, because the majority of Cuban migration is associated with Florida, I further split the USA_metros and USA_micros objects into areas inside and outside of Florida. I believe this provides more context to the general story of Cuban migration, which is hyper-focused on Miami.
# Get variables for all core-based statistical areas across the USA
USA_allcba <- get_acs(geography = "metropolitan statistical area/micropolitan statistical area",
                      year = 2022,
                      variables = acs22_vars,
                      geometry = TRUE,
                      output = "wide")
# Rename the variables for the 2022 data
USA_allcba <- USA_allcba %>%
  rename(total_below_pov = B06012PR_002E,
         tot_vacant22 = B25002_003E,
         med_inc22 = B21004_001E,
         up65_alone22 = B09021_023E,
         tot_housingunits22 = B25136_001E,
         gross_v_inc_perc22 = B25071_001E,
         tot_fb_cuba22 = B05006_143E,
         tot_pa_inc22 = B19057_002E,
         med_yb22 = B25035_001E,
         med_values22 = B25077_001E) %>%
  mutate(vacpct22 = tot_vacant22 / tot_housingunits22) %>%
  st_as_sf(crs = crs)
# Begin filtering
USA_micros <- USA_allcba %>%
  filter(grepl("Micro", NAME, ignore.case = TRUE)) %>%
  dplyr::select(NAME, GEOID, tot_housingunits22, gross_v_inc_perc22, tot_fb_cuba22, med_yb22, vacpct22) %>%
  filter(tot_fb_cuba22 >= 100) %>%
  arrange(desc(tot_fb_cuba22))
USA_metros <- USA_allcba %>%
  dplyr::filter(grepl("Metro", NAME, ignore.case = TRUE)) %>%
  dplyr::select(NAME, GEOID, tot_housingunits22, gross_v_inc_perc22, tot_fb_cuba22, med_yb22, vacpct22) %>%
  filter(tot_fb_cuba22 >= 1000) %>%
  arrange(desc(tot_fb_cuba22))
# Create final objects for kable tables
noFL_micros <- USA_micros %>%
  filter(!grepl("FL M", NAME, ignore.case = TRUE))
noFL_metros <- USA_metros %>%
  filter(!grepl("FL M", NAME, ignore.case = TRUE))
fl_micros <- USA_micros %>%
  filter(grepl("FL M", NAME, ignore.case = TRUE))
fl_metros <- USA_metros %>%
  filter(grepl("FL M", NAME, ignore.case = TRUE))
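The `"FL M"` pattern assumes that FL is the last state code before "Metro" or "Micro" in a CBSA name. As a precaution for multi-state names, a slightly more explicit sketch extracts the state-code segment and tests for FL as a whole code. `state_part()` and `in_florida()` are hypothetical helpers, and the last name in the example is invented.

```r
# Text between the last ", " and " Metro Area" / " Micro Area"
state_part <- function(name) {
  sub("^.*, ([A-Z-]+) (Metro|Micro) Area$", "\\1", name)
}

# TRUE when "FL" appears as a whole state code, e.g. "FL" or "FL-GA"
in_florida <- function(name) {
  grepl("(^|-)FL(-|$)", state_part(name))
}

cbsa_names <- c("Miami-Fort Lauderdale-Pompano Beach, FL Metro Area",
                "Key West, FL Micro Area",
                "Moultrie, GA Micro Area",
                "Example City, FL-GA Metro Area")  # hypothetical name

in_florida(cbsa_names)
# returns TRUE TRUE FALSE TRUE
```

In the pipeline, this would replace `grepl("FL M", NAME, ignore.case = TRUE)` with `in_florida(NAME)`.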
To get the original county data, I made an API call for all counties in the United States that included the count of foreign-born Cubans in 2009 and in 2022, and calculated the change between 2009 and 2022 for each county. I kept the counties that gained more than 500 foreign-born Cubans and those that lost more than 500, then combined the two sets into one object of 73 counties.
I used these 73 counties to make an iterative get_acs() API call at the tract level, using the process detailed below. The end result is, for each county in question, an object of its census-tract-level changes in the count of foreign-born Cubans. I visualize this object in one of the main outputs of this project: an interactive map that shows the changing distribution of foreign-born Cubans in the United States. Again, through the lens of remittance landscapes, these places are in communication with the material world and the households in Cuba that benefit from the work of these migrants.
USA_2009 <- get_acs(geography = "county",
                    year = 2009,
                    variables = acs09_vars,
                    geometry = TRUE,
                    output = "wide")
USA_2009 <- USA_2009 %>%
  rename(
    fb_placebysex.2009 = B06003_013E,
    totalfb_placebycitizenshipstatus.2009 = B05002_013E,
    total_carib_1980.2009 = B05007_039E,
    total_vacancy_O_status.2009 = B25002_003E,
    total_householdtype_relationship.09 = B09016_002E,
    med_hh_inc.09 = B19013_001E,
    hh_65up.09 = B19037A_053E,
    Total_b_100pov.09 = B06012_002E,
    gross_v_income_percentage.09 = B25071_001E,
    med_built_year.09 = B25035_001E,
    tot_publicass_inc.09 = B19057_001E,
    tot_housingunits.09 = B25001_001E,
    cub_fb_total09 = B05006_127E,
    med_houseval09 = B25077_001E
  ) %>%
  mutate(vacancyPct.2009 = total_vacancy_O_status.2009 / tot_housingunits.09) %>% # Get vacancy rate
  st_as_sf(crs = crs)
USA_2022 <- get_acs(geography = "county",
                    year = 2022,
                    variables = acs22_vars,
                    geometry = TRUE,
                    output = "wide")
USA_2022 <- USA_2022 %>%
  rename(
    tot_below100pov22 = B06012PR_002E,
    tot_vacant22 = B25002_003E,
    med_inc22 = B21004_001E,
    up65_alone22 = B09021_023E,
    tot_housingunits22 = B25136_001E,
    gross_v_inc_perc22 = B25071_001E,
    tot_fb_cuba22 = B05006_143E,
    tot_pa_inc22 = B19057_002E,
    med_yb22 = B25035_001E,
    med_values22 = B25077_001E) %>%
  mutate(vacpct22 = tot_vacant22 / tot_housingunits22) %>%
  st_as_sf(crs = crs)
# Merge the data frames: 2022 attributes (geometry dropped) joined to the 2009 county data,
# whose geometry column is carried along; shared column names get .x (2022) and .y (2009)
USA0922_df <- st_drop_geometry(USA_2022) %>%
  left_join(USA_2009, by = "GEOID") %>%
  mutate(change_vac_pct = vacpct22 - vacancyPct.2009,
         change_med_inc = med_inc22 - med_hh_inc.09,
         change_med_values = med_values22 - med_houseval09,
         change_count_housingunits = tot_housingunits22 - tot_housingunits.09,
         change_cuba_fb = tot_fb_cuba22 - cub_fb_total09,
         change_pct_below100pov = tot_below100pov22 - Total_b_100pov.09)
# Begin filtering: counties whose foreign-born Cuban count changed by more than 500 in either direction
## Decrease
USA0922_df_filtered_decrease <- USA0922_df %>%
  dplyr::filter(change_cuba_fb < -500) %>%
  dplyr::select(NAME.x, GEOID, geometry, cub_fb_total09, med_inc22, up65_alone22, tot_housingunits22,
                gross_v_inc_perc22, change_count_housingunits, tot_pa_inc22, med_values22, vacpct22,
                change_med_values, change_cuba_fb, change_med_inc, change_vac_pct)
## Increase
USA0922_df_filtered_increase <- USA0922_df %>%
  dplyr::filter(change_cuba_fb > 500) %>%
  dplyr::select(NAME.x, GEOID, geometry, med_inc22, up65_alone22, cub_fb_total09, tot_housingunits22,
                gross_v_inc_perc22, change_count_housingunits, tot_pa_inc22, med_values22, vacpct22,
                change_med_values, change_med_inc, change_vac_pct, change_cuba_fb)
# Combine the two data frames into the filtered counties object used for the tract-level pull
change_fbcuban0922 <- rbind(USA0922_df_filtered_decrease, USA0922_df_filtered_increase) %>%
  st_as_sf(crs = crs)
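The separate decrease and increase filters, followed by rbind(), can also be expressed as one condition on the absolute change. A sketch on toy data with hypothetical county labels and values; in the dplyr pipeline the equivalent would be `filter(abs(change_cuba_fb) > 500)`.

```r
# Toy county changes (hypothetical labels and values, not Census data)
counties <- data.frame(
  county = c("A", "B", "C", "D"),
  change_cuba_fb = c(-800, 1200, 300, -500),
  stringsAsFactors = FALSE
)

# Keep counties whose change exceeds 500 in absolute value
# (strict, matching the > 500 and < -500 filters above)
selected <- subset(counties, abs(change_cuba_fb) > 500)

selected$county
# returns "A" "B"
```

A single filter also keeps one consistent column order, so there is no need to line up two select() calls before binding.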
After analyzing the counties, I chose plus or minus 500 foreign-born Cubans as the cutoff for both decreases and increases. This produced an sf object of counties that, between 2009 and 2022, either lost or gained more than 500 foreign-born Cubans. I then used this object as the input for an iterative call to retrieve their census-tract-level data.
I built the interactive map below at this level.
It shows the census-tract-level data for the filtered counties. For additional context, I wanted to add another layer showing the presence of Cubans relative to the boundaries and centers of metropolitan and micropolitan areas.
Processing the NAME Column to Create County and State Arguments
The chunk below shows the two main steps necessary to transform the NAME column of the county-level get_acs() output so that those counties can be fed back into a new call for their census-tract data.
split_county_state <- function(name) {
  # Split the string on the first comma to separate county and state
  parts <- str_split(name, pattern = ",", n = 2, simplify = TRUE)
  county <- trimws(parts[1])
  state <- trimws(parts[2])
  return(list(county = county, state = state))
}
clean_county_name <- function(county_name) {
  # Remove 'County' from the end of the string and any leading/trailing spaces
  cleaned_name <- sub("County$", "", county_name)
  cleaned_name <- trimws(cleaned_name)
  return(cleaned_name)
}
# Apply the functions to get the 'County' and 'State' columns used as get_acs() arguments
change_fbcuban0922_split <- change_fbcuban0922 %>%
  mutate(
    split_data = map(NAME.x, split_county_state),
    County = map_chr(split_data, "county"),
    State = map_chr(split_data, "state")
  ) %>%
  dplyr::select(-split_data) %>% # Remove the list column after extracting components
  mutate(cleaned_County = sapply(County, clean_county_name))
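Matching by county name may be fragile: clean_county_name() strips only a trailing "County", so a Louisiana parish keeps "Parish" in its name. Because get_acs() also accepts FIPS codes for its `state` and `county` arguments, one alternative is to derive them from the five-digit county GEOID. A minimal sketch with a hypothetical helper, `split_geoid()`:

```r
# County GEOIDs are five digits: 2-digit state FIPS + 3-digit county FIPS
split_geoid <- function(geoid) {
  data.frame(
    GEOID       = geoid,
    state_fips  = substr(geoid, 1, 2),
    county_fips = substr(geoid, 3, 5),
    stringsAsFactors = FALSE
  )
}

# Miami-Dade County, FL and Orleans Parish, LA
split_geoid(c("12086", "22071"))
```

The resulting `state_fips` and `county_fips` columns could then be passed as `state` and `county` in the tract-level calls, avoiding name cleaning altogether.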
Function Takes New Columns, Performs API Call, Combines Each Output Into One Object
The following chunk creates the iterative functions that take the 'cleaned_County' and 'State' columns and run a tidycensus get_acs() call for 2009 and 2022. The variable codes differ between years, so the calls are made separately.
iterate_county22 <- function(df) {
  # List to store the results
  results_list <- list()
  # Loop over each row of the data frame
  for (i in seq_len(nrow(df))) {
    # Extract county and state from the current row
    County <- df$cleaned_County[i]
    State <- df$State[i]
    # Try to fetch ACS data, handling errors
    acs_data <- tryCatch({
      get_acs(
        geography = "tract",
        variables = "B05006_143E", # 2022 code: foreign-born population born in Cuba
        state = State,
        county = County,
        year = 2022,
        survey = "acs5",
        geometry = TRUE
      )
    }, error = function(e) {
      message("Failed for ", County, ", ", State, ": ", e$message)
      NULL # Return NULL on failure
    })
    # Append the fetched data to the list, if not NULL
    if (!is.null(acs_data)) {
      results_list[[length(results_list) + 1]] <- acs_data
    }
  }
  # Combine all the results into one data frame
  final_data <- bind_rows(results_list)
  return(final_data)
}
iterate_county09 <- function(df) {
  # List to store the results
  results_list <- list()
  # Loop over each row of the data frame
  for (i in seq_len(nrow(df))) {
    # Extract county and state from the current row
    County <- df$cleaned_County[i]
    State <- df$State[i]
    # Try to fetch ACS data, handling errors
    acs_data <- tryCatch({
      get_acs(
        geography = "tract",
        variables = "B05006_127E", # 2009 code: foreign-born population born in Cuba
        state = State,
        county = County,
        year = 2009,
        survey = "acs5",
        geometry = TRUE
      )
    }, error = function(e) {
      message("Failed for ", County, ", ", State, ": ", e$message)
      NULL # Return NULL on failure
    })
    # Append the fetched data to the list, if not NULL
    if (!is.null(acs_data)) {
      results_list[[length(results_list) + 1]] <- acs_data
    }
  }
  # Combine all the results into one data frame
  final_data <- bind_rows(results_list)
  return(final_data)
}
# Optional: run the 'read_objects_from_file' chunk instead to save time
final_acs09_data <- iterate_county09(change_fbcuban0922_split)
final_acs22_data <- iterate_county22(change_fbcuban0922_split)
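The two functions above differ only in the year and the variable code. A hedged sketch of one helper that takes the fetching step as an argument: `iterate_counties()` and `fake_fetch()` are hypothetical names, and `fake_fetch()` stands in for get_acs() so the example runs offline.

```r
# Generic loop: `fetch` is any function(state, county) returning a data frame
iterate_counties <- function(df, fetch) {
  results <- lapply(seq_len(nrow(df)), function(i) {
    tryCatch(
      fetch(state = df$State[i], county = df$cleaned_County[i]),
      error = function(e) {
        message("Failed for ", df$cleaned_County[i], ", ", df$State[i],
                ": ", conditionMessage(e))
        NULL  # Failed calls are skipped below
      }
    )
  })
  # Drop failed calls (NULL) and stack the rest
  do.call(rbind, Filter(Negate(is.null), results))
}

# Offline stand-in for get_acs(); errors on an unknown county
fake_fetch <- function(state, county) {
  if (county == "Nowhere") stop("unknown county")
  data.frame(state = state, county = county, estimate = 1,
             stringsAsFactors = FALSE)
}

counties_df <- data.frame(State = c("Florida", "Texas", "Ohio"),
                          cleaned_County = c("Miami-Dade", "Harris", "Nowhere"),
                          stringsAsFactors = FALSE)
out <- iterate_counties(counties_df, fake_fetch)
```

With real data, `fetch` could be a wrapper such as `function(state, county) get_acs(geography = "tract", variables = "B05006_143", state = state, county = county, year = 2022, geometry = TRUE)`, with a second wrapper for 2009.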
Transforming Tract Data and Calculating the Change in Foreign-Born Cubans
final_acs09_data <- final_acs09_data %>%
  rename(cub_fb_total09 = estimate)
final_acs22_data <- final_acs22_data %>%
  rename(tot_fb_cuba22 = estimate)
final_viz_change <- st_drop_geometry(final_acs09_data) %>%
  dplyr::select(GEOID, cub_fb_total09) %>% # Select only the columns needed for computing change
  left_join(final_acs22_data, by = "GEOID") %>%
  mutate(change_cuba_fb = tot_fb_cuba22 - cub_fb_total09)
# Drop tracts with an NA change (e.g., 2009 GEOIDs with no 2022 match); they can't be mapped
final_viz_change_droppedNA <- final_viz_change[!is.na(final_viz_change$change_cuba_fb), ]
# Also drop tracts with zero change
final_viz_change_droppedZ_and_NA <- final_viz_change_droppedNA[final_viz_change_droppedNA$change_cuba_fb != 0, ]
# Add geometry back (the geometry column comes from the 2022 tract data)
final_viz_change_droppedNA <- final_viz_change_droppedNA %>%
  st_as_sf(sf_column_name = "geometry")
final_viz_change_droppedZ_and_NA <- final_viz_change_droppedZ_and_NA %>%
  st_as_sf(sf_column_name = "geometry")
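One reason NA changes appear: 2009 ACS 5-year tract estimates use 2000 Census tract boundaries, while 2022 estimates use 2020 boundaries. A GEOID join therefore pairs only tracts whose IDs exist in both vintages, and a matching ID does not guarantee identical boundaries. A toy sketch with hypothetical IDs and counts, using base merge() as a stand-in for the left_join() above:

```r
# Tract "001" keeps its ID; "002" is replaced by "003" and "004" in the newer vintage
tracts09 <- data.frame(GEOID = c("001", "002"), cub_fb_total09 = c(40, 90),
                       stringsAsFactors = FALSE)
tracts22 <- data.frame(GEOID = c("001", "003", "004"), tot_fb_cuba22 = c(55, 30, 70),
                       stringsAsFactors = FALSE)

# Left join from the 2009 side, as in the chunk above
joined <- merge(tracts09, tracts22, by = "GEOID", all.x = TRUE)
joined$change_cuba_fb <- joined$tot_fb_cuba22 - joined$cub_fb_total09

joined
# "001" has a change of 15; "002" has NA (no 2022 match);
# "003" and "004" are absent because they exist only in 2022
```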
The following tables show the markets that remained after I applied the previously described filters to the metropolitan and micropolitan areas of the United States. Each table is sorted by the count of foreign-born Cubans, highest first. The green column highlights these counts, and the yellow column shows the gross rent to income ratio (median gross rent as a percentage of household income, ACS table B25071). The counts of total housing units are also worth noting, as they speak to the scale of each market.
Disclaimer: because there are outlier markets, the summary statistics report the median rather than the average for the counts of housing units and of foreign-born Cubans.
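To make the ratio concrete, a small sketch translating it into income left after rent. The $40,000 income is hypothetical, and applying an area's median ratio to a single household is a simplification; the two ratios are the Miami and Louisville values from the tables below.

```r
# Median gross rent as a percentage of household income (from the tables)
rent_pct <- c(Miami = 36.8, Louisville = 27.8)

# Hypothetical annual income, chosen only for illustration
income <- 40000

# Income left if a household paid exactly the area's median share
left_after_rent <- income * (1 - rent_pct / 100)
left_after_rent
# Miami 25280, Louisville 28880
```

Under these assumptions, the nine-point gap in the ratio amounts to about $3,600 a year, money that could potentially be remitted.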
Table
# Sorting the data
noFL_metros_sorted <- noFL_metros %>%
  dplyr::arrange(desc(tot_fb_cuba22))
# Creating the table with kable and kableExtra
noFL_metros_kable <- noFL_metros_sorted %>%
  st_drop_geometry() %>%
  dplyr::mutate(Total_Housing_Units = tot_housingunits22,
                Rent_Income_Ratio22 = gross_v_inc_perc22,
                Total_FB_Cubans = tot_fb_cuba22,
                Med_Year_Built = med_yb22) %>%
  dplyr::select(NAME, Total_Housing_Units, Rent_Income_Ratio22, Total_FB_Cubans, Med_Year_Built) %>%
  kbl(caption = "Metropolitan Areas Outside Florida") %>%
  kable_classic(full_width = FALSE, html_font = "Georgia") %>%
  kable_styling(position = "center", font_size = 12) %>%
  column_spec(1, bold = TRUE, color = "#ca5733") %>%
  column_spec(3, background = "#fbc61d") %>%
  column_spec(4, bold = TRUE, background = "#3b914e")
# Print the table to view in an RMarkdown output or similar
noFL_metros_kable
| NAME | Total_Housing_Units | Rent_Income_Ratio22 | Total_FB_Cubans | Med_Year_Built |
|---|---|---|---|---|
| New York-Newark-Jersey City, NY-NJ-PA Metro Area | 7981356 | 31.0 | 61897 | 1959 |
| Houston-The Woodlands-Sugar Land, TX Metro Area | 2760561 | 30.5 | 30963 | 1991 |
| Las Vegas-Henderson-Paradise, NV Metro Area | 923275 | 32.5 | 23902 | 1997 |
| Los Angeles-Long Beach-Anaheim, CA Metro Area | 4730219 | 33.7 | 16932 | 1968 |
| Louisville/Jefferson County, KY-IN Metro Area | 561271 | 27.8 | 13508 | 1976 |
| Dallas-Fort Worth-Arlington, TX Metro Area | 2963281 | 29.7 | 12752 | 1990 |
| Atlanta-Sandy Springs-Alpharetta, GA Metro Area | 2420310 | 30.7 | 8286 | 1993 |
| Phoenix-Mesa-Chandler, AZ Metro Area | 1996937 | 29.8 | 7825 | 1993 |
| Austin-Round Rock-Georgetown, TX Metro Area | 960087 | 29.1 | 6941 | 1999 |
| Chicago-Naperville-Elgin, IL-IN-WI Metro Area | 3942534 | 29.2 | 6462 | 1970 |
| Riverside-San Bernardino-Ontario, CA Metro Area | 1584750 | 34.0 | 4889 | 1986 |
| Washington-Arlington-Alexandria, DC-VA-MD-WV Metro Area | 2500311 | 28.9 | 4675 | 1982 |
| New Orleans-Metairie, LA Metro Area | 572691 | 33.3 | 4152 | 1976 |
| Philadelphia-Camden-Wilmington, PA-NJ-DE-MD Metro Area | 2590451 | 30.3 | 3664 | 1965 |
| San Antonio-New Braunfels, TX Metro Area | 1015924 | 30.5 | 3526 | 1991 |
| Charlotte-Concord-Gastonia, NC-SC Metro Area | 1115218 | 28.6 | 3380 | 1994 |
| Boston-Cambridge-Newton, MA-NH Metro Area | 2033504 | 29.9 | 2939 | 1963 |
| Nashville-Davidson–Murfreesboro–Franklin, TN Metro Area | 836994 | 29.7 | 2617 | 1992 |
| Kansas City, MO-KS Metro Area | 940968 | 27.7 | 2168 | 1978 |
| San Francisco-Oakland-Berkeley, CA Metro Area | 1851003 | 28.4 | 2073 | 1967 |
| Albuquerque, NM Metro Area | 395967 | 30.8 | 2035 | 1985 |
| Grand Rapids-Kentwood, MI Metro Area | 432268 | 28.9 | 1966 | 1978 |
| Lancaster, PA Metro Area | 216592 | 28.1 | 1917 | 1977 |
| San Diego-Chula Vista-Carlsbad, CA Metro Area | 1230349 | 33.6 | 1844 | 1979 |
| Portland-Vancouver-Hillsboro, OR-WA Metro Area | 1036369 | 30.4 | 1831 | 1983 |
| Rochester, NY Metro Area | 489741 | 30.4 | 1771 | 1966 |
| Seattle-Tacoma-Bellevue, WA Metro Area | 1657075 | 29.2 | 1625 | 1984 |
| Detroit-Warren-Dearborn, MI Metro Area | 1905259 | 29.7 | 1620 | 1968 |
| Virginia Beach-Norfolk-Newport News, VA-NC Metro Area | 761331 | 31.1 | 1547 | 1982 |
| Baltimore-Columbia-Towson, MD Metro Area | 1190378 | 30.2 | 1438 | 1975 |
| Raleigh-Cary, NC Metro Area | 581802 | 28.4 | 1374 | 1998 |
| Denver-Aurora-Lakewood, CO Metro Area | 1245265 | 30.5 | 1348 | 1985 |
| Midland, TX Metro Area | 72376 | 28.0 | 1319 | 1989 |
| Hartford-East Hartford-Middletown, CT Metro Area | 521773 | 30.2 | 1258 | 1967 |
| Minneapolis-St. Paul-Bloomington, MN-WI Metro Area | 1509511 | 28.9 | 1213 | 1980 |
| Grand Island, NE Metro Area | 31716 | 27.3 | 1119 | 1973 |
| Syracuse, NY Metro Area | 296553 | 29.7 | 1086 | 1964 |
| Odessa, TX Metro Area | 66082 | 28.7 | 1067 | 1981 |
| Lansing-East Lansing, MI Metro Area | 236080 | 29.5 | 1024 | 1973 |
| New Haven-Milford, CT Metro Area | 371281 | 31.4 | 1002 | 1964 |
Summary statistics
summary_stats_noflmetros <- data.frame(
  Statistic = c("Median Count Total Housing Units", "Average Rent to Income Ratio (Percentage)",
                "Median Count of Foreign Born Cubans", "Mean Year Structure Built"),
  Value = round(c(1026146, 30.0, 2054, 1980), 0)
)
# Generate kable table
kable(summary_stats_noflmetros, format = "html", col.names = c("Statistic", "Value"),
      caption = "Summary Statistics") %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE) %>%
  row_spec(0, bold = TRUE, background = "#68a6d5")
| Statistic | Value |
|---|---|
| Median Count Total Housing Units | 1026146 |
| Average Rent to Income Ratio (Percentage) | 30 |
| Median Count of Foreign Born Cubans | 2054 |
| Mean Year Structure Built | 1980 |
Table
# Sorting the data
fl_metros_sorted <- fl_metros %>%
  dplyr::arrange(desc(tot_fb_cuba22))
# Creating the table with kable and kableExtra
fl_metro_kable <- fl_metros_sorted %>%
  st_drop_geometry() %>%
  dplyr::mutate(Total_Housing_Units = tot_housingunits22,
                Rent_Income_Ratio22 = gross_v_inc_perc22,
                Total_FB_Cubans = tot_fb_cuba22,
                Med_Year_Built = med_yb22) %>%
  dplyr::select(NAME, Total_Housing_Units, Rent_Income_Ratio22, Total_FB_Cubans, Med_Year_Built) %>%
  kableExtra::kbl(caption = "Florida Metropolitan Areas") %>%
  kable_classic(full_width = FALSE, html_font = "Georgia") %>%
  kable_styling(position = "center", font_size = 12) %>%
  column_spec(1, bold = TRUE, color = "#ca5733") %>%
  column_spec(3, background = "#fbc61d") %>%
  column_spec(4, bold = TRUE, background = "#3b914e")
# Print the table to view in an RMarkdown output or similar
fl_metro_kable
| NAME | Total_Housing_Units | Rent_Income_Ratio22 | Total_FB_Cubans | Med_Year_Built |
|---|---|---|---|---|
| Miami-Fort Lauderdale-Pompano Beach, FL Metro Area | 2643202 | 36.8 | 777702 | 1982 |
| Tampa-St. Petersburg-Clearwater, FL Metro Area | 1471328 | 32.4 | 81108 | 1985 |
| Orlando-Kissimmee-Sanford, FL Metro Area | 1094927 | 33.7 | 33255 | 1993 |
| Cape Coral-Fort Myers, FL Metro Area | 419916 | 33.5 | 31798 | 1994 |
| Naples-Marco Island, FL Metro Area | 229814 | 35.9 | 18677 | 1995 |
| Jacksonville, FL Metro Area | 695854 | 31.0 | 8893 | 1990 |
| North Port-Sarasota-Bradenton, FL Metro Area | 462959 | 32.9 | 8672 | 1988 |
| Lakeland-Winter Haven, FL Metro Area | 320023 | 30.8 | 7372 | 1990 |
| Port St. Lucie, FL Metro Area | 231647 | 33.7 | 5223 | 1989 |
| Palm Bay-Melbourne-Titusville, FL Metro Area | 290314 | 32.1 | 3420 | 1987 |
| Deltona-Daytona Beach-Ormond Beach, FL Metro Area | 330161 | 33.4 | 3219 | 1988 |
| Ocala, FL Metro Area | 179079 | 30.0 | 3022 | 1991 |
| Sebring-Avon Park, FL Metro Area | 57605 | 29.9 | 2137 | 1986 |
| Gainesville, FL Metro Area | 152302 | 35.8 | 2122 | 1988 |
| Sebastian-Vero Beach, FL Metro Area | 83801 | 34.4 | 1609 | 1990 |
| Punta Gorda, FL Metro Area | 111330 | 35.4 | 1500 | 1989 |
| Tallahassee, FL Metro Area | 173637 | 33.8 | 1165 | 1988 |
| Pensacola-Ferry Pass-Brent, FL Metro Area | 222708 | 29.9 | 1014 | 1988 |
Summary Statistics
summary_stats_flmetros <- data.frame(
  Statistic = c("Median Count Total Housing Units", "Average Rent to Income Ratio (Percentage)",
                "Median Count of Foreign Born Cubans", "Mean Year Structure Built"),
  Value = round(c(260981, 33.08, 4322, 1989), 0)
)
# Generate kable table
kable(summary_stats_flmetros, format = "html", col.names = c("Statistic", "Value"),
      caption = "Summary Statistics") %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE) %>%
  row_spec(0, bold = TRUE, background = "#68a6d5")
| Statistic | Value |
|---|---|
| Median Count Total Housing Units | 260981 |
| Average Rent to Income Ratio (Percentage) | 33 |
| Median Count of Foreign Born Cubans | 4322 |
| Mean Year Structure Built | 1989 |
Table
# Sorting the data
fl_micros_sorted <- fl_micros %>%
  dplyr::arrange(desc(tot_fb_cuba22))
# Creating the table with kable and kableExtra
fl_micros_kable <- fl_micros_sorted %>%
  st_drop_geometry() %>%
  dplyr::mutate(Total_Housing_Units = tot_housingunits22,
                Rent_Income_Ratio22 = gross_v_inc_perc22,
                Total_FB_Cubans = tot_fb_cuba22,
                Med_Year_Built = med_yb22) %>%
  dplyr::select(NAME, Total_Housing_Units, Rent_Income_Ratio22, Total_FB_Cubans, Med_Year_Built) %>%
  kbl(caption = "Florida Micropolitan Areas") %>%
  kable_classic(full_width = FALSE, html_font = "Georgia") %>%
  kable_styling(position = "center", font_size = 12) %>%
  column_spec(1, bold = TRUE, color = "#ca5733") %>%
  column_spec(3, background = "#fbc61d") %>%
  column_spec(4, bold = TRUE, background = "#3b914e")
# Print the table to view in an RMarkdown output or similar
fl_micros_kable
| NAME | Total_Housing_Units | Rent_Income_Ratio22 | Total_FB_Cubans | Med_Year_Built |
|---|---|---|---|---|
| Key West, FL Micro Area | 54034 | 34.3 | 6868 | 1981 |
| Clewiston, FL Micro Area | 15227 | 34.8 | 2811 | 1987 |
| Okeechobee, FL Micro Area | 18496 | 28.5 | 471 | 1987 |
| Arcadia, FL Micro Area | 15567 | 30.0 | 251 | 1987 |
| Wauchula, FL Micro Area | 9837 | 33.6 | 237 | 1984 |
| Lake City, FL Micro Area | 29835 | 28.5 | 131 | 1991 |
| Palatka, FL Micro Area | 36183 | 30.7 | 127 | 1984 |
Summary Statistics
summary_stats_flmicro <- data.frame(
  Statistic = c("Median Count Total Housing Units", "Average Rent to Income Ratio",
                "Median Count of Foreign Born Cubans", "Mean Year Structure Built"),
  Value = round(c(18496, 31.49, 251, 1986), 0)
)
# Generate kable table
kable(summary_stats_flmicro, format = "html", col.names = c("Statistic", "Value"),
      caption = "Summary Statistics") %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE) %>%
  row_spec(0, bold = TRUE, background = "#68a6d5")
| Statistic | Value |
|---|---|
| Median Count Total Housing Units | 18496 |
| Average Rent to Income Ratio | 31 |
| Median Count of Foreign Born Cubans | 251 |
| Mean Year Structure Built | 1986 |
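The summary values in these chunks are typed in by hand. A sketch that computes them from the data instead, using the Florida micropolitan values copied from the table above; with the real sf object the equivalent calls would be, for example, `median(fl_micros$tot_housingunits22)` and `mean(fl_micros$gross_v_inc_perc22)`.

```r
# Florida micropolitan areas, copied from the table above
fl_micro <- data.frame(
  housing_units = c(54034, 15227, 18496, 15567, 9837, 29835, 36183),
  rent_income   = c(34.3, 34.8, 28.5, 30.0, 33.6, 28.5, 30.7),
  fb_cubans     = c(6868, 2811, 471, 251, 237, 131, 127),
  med_year      = c(1981, 1987, 1987, 1987, 1984, 1991, 1984)
)

# Medians for the skewed counts, means for the ratio and year built
summary_stats <- data.frame(
  Statistic = c("Median Count Total Housing Units",
                "Average Rent to Income Ratio",
                "Median Count of Foreign Born Cubans",
                "Mean Year Structure Built"),
  Value = round(c(median(fl_micro$housing_units),
                  mean(fl_micro$rent_income),
                  median(fl_micro$fb_cubans),
                  mean(fl_micro$med_year)), 0)
)
summary_stats$Value
# 18496 31 251 1986
```

Computing the statistics this way keeps the summary tables in sync if the filters or the ACS vintage change.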
Table
# Sorting the data
noFL_micros_sorted <- noFL_micros %>%
  dplyr::arrange(desc(tot_fb_cuba22))
# Creating the table with kable and kableExtra
noFL_micros_kable <- noFL_micros_sorted %>%
  st_drop_geometry() %>%
  dplyr::mutate(Total_Housing_Units = tot_housingunits22,
                Rent_Income_Ratio22 = gross_v_inc_perc22,
                Total_FB_Cubans = tot_fb_cuba22,
                Med_Year_Built = med_yb22) %>%
  dplyr::select(NAME, Total_Housing_Units, Rent_Income_Ratio22, Total_FB_Cubans, Med_Year_Built) %>%
  kbl(caption = "Micropolitan Areas Outside of Florida") %>%
  kable_classic(full_width = FALSE, html_font = "Georgia") %>%
  kable_styling(position = "center", font_size = 12) %>%
  column_spec(1, bold = TRUE, color = "#ca5733") %>%
  column_spec(3, background = "#fbc61d") %>%
  column_spec(4, bold = TRUE, background = "#3b914e")
# Print the table to view in an RMarkdown output or similar
noFL_micros_kable
| NAME | Total_Housing_Units | Rent_Income_Ratio22 | Total_FB_Cubans | Med_Year_Built |
|---|---|---|---|---|
| Moultrie, GA Micro Area | 19143 | 28.9 | 458 | 1985 |
| Dumas, TX Micro Area | 8184 | 22.3 | 422 | 1973 |
| Columbus, NE Micro Area | 14085 | 26.0 | 412 | 1972 |
| Storm Lake, IA Micro Area | 8197 | 22.2 | 369 | 1962 |
| Hastings, NE Micro Area | 13804 | 27.4 | 361 | 1965 |
| Norfolk, NE Micro Area | 20832 | 24.5 | 352 | 1971 |
| Hobbs, NM Micro Area | 27854 | 26.3 | 290 | 1975 |
| Alamogordo, NM Micro Area | 32244 | 27.9 | 270 | 1984 |
| Hereford, TX Micro Area | 6983 | 20.5 | 249 | 1969 |
| Dodge City, KS Micro Area | 12568 | 21.2 | 246 | 1971 |
| Georgetown, SC Micro Area | 36219 | 30.4 | 199 | 1991 |
| Austin, MN Micro Area | 16933 | 27.3 | 197 | 1957 |
| Shelby, NC Micro Area | 43782 | 30.1 | 168 | 1979 |
| Yankton, SD Micro Area | 10405 | 24.5 | 160 | 1976 |
| Toccoa, GA Micro Area | 12342 | 24.5 | 155 | 1982 |
| Guymon, OK Micro Area | 8448 | 23.3 | 151 | 1974 |
| Richmond-Berea, KY Micro Area | 46221 | 28.3 | 147 | 1990 |
| Douglas, GA Micro Area | 20864 | 25.7 | 133 | 1988 |
| Jasper, IN Micro Area | 24185 | 24.3 | 124 | 1978 |
| Lexington, NE Micro Area | 11049 | 21.9 | 123 | 1967 |
| Hermiston-Pendleton, OR Micro Area | 35988 | 24.6 | 119 | 1977 |
| Ottumwa, IA Micro Area | 15754 | 29.2 | 118 | 1957 |
| Newport, TN Micro Area | 17833 | 28.1 | 117 | 1986 |
| Washington Court House, OH Micro Area | 12685 | 25.0 | 115 | 1970 |
| McMinnville, TN Micro Area | 18164 | 26.9 | 108 | 1978 |
| Seymour, IN Micro Area | 19144 | 26.4 | 103 | 1978 |
| Lumberton, NC Micro Area | 48811 | 29.3 | 100 | 1985 |
Summary Statistics
summary_stats_nofl_micros <- data.frame(
  Statistic = c("Median Count Total Housing Units", "Average Rent to Income Ratio (Percentage)",
                "Median Count of Foreign Born Cubans", "Mean Year Structure Built"),
  Value = round(c(17833, 25.81, 160, 1976), 0)
)
# Generate kable table
kable(summary_stats_nofl_micros, format = "html", col.names = c("Statistic", "Value"),
      caption = "Summary Statistics") %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE) %>%
  row_spec(0, bold = TRUE, background = "#68a6d5")
| Statistic | Value |
|---|---|
| Median Count Total Housing Units | 17833 |
| Average Rent to Income Ratio (Percentage) | 26 |
| Median Count of Foreign Born Cubans | 160 |
| Mean Year Structure Built | 1976 |
For each model, I examined how the total count of foreign-born Cubans related to the variables shown above (total number of housing units, gross rent to income ratio, and median year structures were built). The results of the regressions below lack predictive power, highlighting a need for better feature engineering to predict which markets attract Cuban immigrants.
In some models, the total count of housing units appeared as a significant predictor (p < 0.001), but its effect was negligible: each additional housing unit corresponded to only a small fraction of an additional foreign-born Cuban. The analysis would benefit from more preparation to catch outliers and from a larger number of observations.
Below are some results from the exploratory analysis.
Outside of Florida: Metropolitan Areas Regression
The total count of housing units was a significant predictor until I removed the New York-Newark metropolitan area from the model’s observations. Even where it was significant, its effect was small: each additional housing unit was associated with less than one additional foreign-born Cuban.
Below is a scatterplot showing that the metropolitan areas outside of Florida where Cuban immigrants tend to live have fewer than 3 million housing units, with a visible concentration among areas with fewer than 2 million. This may suggest that Cubans tend to settle in markets smaller than a world city like New York.
# Remove the outlier (New York-Newark) for a clearer scatterplot of housing units
noFL_metros <- noFL_metros %>%
  filter(!str_detect(NAME, "New York"))
noFL_metro_fit <- lm(tot_fb_cuba22 ~ tot_housingunits22 + gross_v_inc_perc22 + med_yb22, data = noFL_metros)
summary(noFL_metro_fit)
# Housing units on the x-axis, foreign-born Cubans on the y-axis
plot(noFL_metros$tot_housingunits22, noFL_metros$tot_fb_cuba22,
     xlab = "Total housing units", ylab = "Foreign-born Cubans")
Florida: Metropolitan Areas Regression
In Florida, Miami and Tampa are clear outliers, so I removed them and ran the same regression as above. The adjusted R² was 0.74, suggesting a stronger model even with the few variables available. Again, the strong predictors were housing units (p < 0.005) and the median year structures were built (p < 0.002).
Below is a scatterplot of the total count of housing units and the total count of foreign-born Cubans.
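# Filter out Miami and Tampa (outliers) for the regression and scatterplot
fl_metros <- fl_metros %>%
  filter(!str_detect(NAME, "Miami|Tampa"))
fl_metros_fit <- lm(tot_fb_cuba22 ~ tot_housingunits22 + gross_v_inc_perc22 + med_yb22, data = fl_metros)
summary(fl_metros_fit)
# Housing units on the x-axis, foreign-born Cubans on the y-axis
plot(fl_metros$tot_housingunits22, fl_metros$tot_fb_cuba22,
     xlab = "Total housing units", ylab = "Foreign-born Cubans")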
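Because the coefficients are expressed per single housing unit, they look tiny. A sketch showing that rescaling the predictor to units of 100,000 housing units multiplies the slope by 100,000 without changing the fit. It uses only the housing unit and foreign-born Cuban counts from the Florida metropolitan table (Miami and Tampa removed), so it is a single-predictor illustration, not the three-variable model above.

```r
# Florida metro areas from the table, excluding Miami and Tampa
fl <- data.frame(
  housing = c(1094927, 419916, 229814, 695854, 462959, 320023, 231647, 290314,
              330161, 179079, 57605, 152302, 83801, 111330, 173637, 222708),
  cubans  = c(33255, 31798, 18677, 8893, 8672, 7372, 5223, 3420,
              3219, 3022, 2137, 2122, 1609, 1500, 1165, 1014)
)

# Slope per single housing unit
fit_units <- lm(cubans ~ housing, data = fl)
# Same model with the predictor expressed in units of 100,000 housing units
fit_100k <- lm(cubans ~ I(housing / 1e5), data = fl)

coef(fit_units)[2]  # additional Cubans per housing unit
coef(fit_100k)[2]   # additional Cubans per 100,000 housing units
```

Reporting the slope per 100,000 units keeps the same model while making the effect size easier to read.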
Outside of Florida: Micropolitan Areas
The resulting regression did not return any strong predictors. This could be due to the distribution of the data, seen below. It makes sense given that we are talking about micropolitan areas. My theory is that the decision to settle in these areas is due to factors that would require deeper investigation than one using Census data. They may be more nuanced and network-related. Interviews and oral histories would best shed light on this aspect of the Cuban migration story.
noFl_micro_fit <- lm(tot_fb_cuba22 ~ tot_housingunits22 + gross_v_inc_perc22 + med_yb22, data=noFL_micros)
summary(noFl_micro_fit)
plot(noFL_micros$tot_housingunits22, noFL_micros$tot_fb_cuba22)
Florida: Micropolitan Areas Regression
There are too few observations to make any meaningful statistical inferences for this subset of data. However, as with the micropolitan areas outside of Florida, it would be worth investigating the roots of their Cuban communities more deeply.
Florida_micros_fit <- lm(tot_fb_cuba22 ~ tot_housingunits22 + gross_v_inc_perc22 + med_yb22, data=fl_micros)
summary(Florida_micros_fit)
plot(fl_micros$tot_housingunits22, fl_micros$tot_fb_cuba22)
In the final dataset used to generate the interactive map, 4,549 census tracts have a non-NA, nonzero change in the count of foreign-born Cubans between 2009 and 2022. The chunk below computes the Spearman and Pearson correlation coefficients (-0.46 and 0.10, respectively) between the count of foreign-born Cubans in 2009 and the change observed by 2022. At the census tract level, the 2009 count could not by itself predict the changes observed in 2022.
spearman <- cor(final_viz_change_droppedZ_and_NA$cub_fb_total09, final_viz_change_droppedZ_and_NA$change_cuba_fb, method = "spearman")
pearson <- cor(final_viz_change_droppedZ_and_NA$cub_fb_total09, final_viz_change_droppedZ_and_NA$change_cuba_fb, method = "pearson") # method can be "pearson", "kendall", or "spearman"
final_tracts_fit <- lm(change_cuba_fb ~ cub_fb_total09,
                       data = final_viz_change_droppedZ_and_NA)
summary(final_tracts_fit)
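The gap between the two coefficients is worth a note: Spearman works on ranks, while Pearson measures linear association and can be pulled around by extreme values. A toy sketch (invented numbers, not Census data) in which a single outlier gives the two measures opposite signs; it only shows that such a gap can arise, not that it explains the tract data.

```r
# Nine points trend downward; the tenth is an extreme outlier
x <- 1:10
y <- c(10, 9, 8, 7, 6, 5, 4, 3, 2, 100)

spearman_toy <- cor(x, y, method = "spearman")  # rank-based
pearson_toy  <- cor(x, y, method = "pearson")   # linear, outlier-sensitive

round(c(spearman = spearman_toy, pearson = pearson_toy), 2)
# spearman -0.45, pearson 0.45
```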
The interactive map below is my favorite part of this project: it combines curiosity about a research question with computer programming to create a useful visual tool. It shows the census tracts across the United States whose count of foreign-born Cubans changed between 2009 and 2022. Clicking a tract displays its value of change. The map also lets us see the relationship between Cuban migration and the structure of the metropolitan region. When tracts are outside urban centers, what does that mean? When they are near city centers, how long have Cubans been there? What kind of neighborhoods are they? Overall, I think this tool can help researchers of Cuban history rethink the footprint of the Cuban diaspora relative to the centuries of history that defined the U.S.-Havana relationship, especially the waves of migration of the past 65 years.